ABSTRACT

In this paper, we consider robust system identification under 
sparse outliers and random noises. In our problem, system 
parameters are observed through a Toeplitz matrix. All ob- 
servations are subject to random noises and a few are cor- 
rupted with outliers. We reduce this problem of system iden- 
tification to a sparse error correcting problem using a Toeplitz 
structured real-numbered coding matrix. We prove performance
guarantees for Toeplitz structured matrices in sparse error
correction. Thresholds on the percentage of correctable
errors for Toeplitz structured matrices are also established.
When both outliers and observation noise are present, we
show that the estimation error goes to 0 asymptotically as
long as the probability density function of the observation noise
is not "vanishing" around 0.

Index Terms: system identification, $\ell_1$ minimization,
Toeplitz matrix, compressed sensing, error correction

1. INTRODUCTION 

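In system identification, an unknown system state $x \in \mathbb{R}^m$ is
often observed through a Toeplitz matrix $H \in \mathbb{R}^{n \times m}$ ($n \ge m$), namely

$$y = Hx,$$

where $y = (y_1, y_2, \ldots, y_n)^T$ is the system output and the
Toeplitz matrix $H$ is equal to

$$H = \begin{bmatrix} h_1 & h_0 & \cdots & h_{-m+2} \\ h_2 & h_1 & \cdots & h_{-m+3} \\ \vdots & \vdots & \ddots & \vdots \\ h_n & h_{n-1} & \cdots & h_{n-m+1} \end{bmatrix}, \tag{1.1}$$

with $h_i$, $-m+2 \le i \le n$, being the system input, assumed to
be an i.i.d. $N(0, 1)$ Gaussian random sequence.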
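The Toeplitz structure of the observation matrix can be sketched in a few lines of Python. This is only an illustration: the helper names `toeplitz_from_input` and `matvec`, and the convention that entry $(i, j)$ of $H$ equals $h_{i-j+1}$ (chosen to match the index range $-m+2 \le i \le n$), are assumptions of the sketch rather than something fixed by the text.

```python
import random

def toeplitz_from_input(h, n, m):
    """Build the n x m Toeplitz matrix with H[i][j] = h_{i-j+1} (1-based i, j).

    `h` is a dict mapping the integer index i (from -m+2 to n) to the input
    sample h_i; this indexing convention is an assumption of the sketch.
    """
    return [[h[i - j + 1] for j in range(1, m + 1)] for i in range(1, n + 1)]

def matvec(H, x):
    """Plain matrix-vector product H x."""
    return [sum(a * b for a, b in zip(row, x)) for row in H]

random.seed(0)
n, m = 6, 3
# n + m - 1 i.i.d. N(0, 1) input samples h_{-m+2}, ..., h_n
h = {i: random.gauss(0.0, 1.0) for i in range(-m + 2, n + 1)}
H = toeplitz_from_input(h, n, m)
x = [1.0, -2.0, 0.5]
y = matvec(H, x)

# Each output sample is a convolution of the input with the system state:
# y_i = sum_j h_{i-j+1} * x_j.
y_conv = [sum(h[i - j + 1] * x[j - 1] for j in range(1, m + 1))
          for i in range(1, n + 1)]
```

Only $n + m - 1$ independent Gaussian samples generate all $nm$ entries of $H$, which is why the rows are correlated.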

If there is no interference or noise in the observation
$y$, one can simply recover $x$ by a matrix inversion.
However, in applications, all observations $y$ are corrupted by
noise and a few elements can be exposed to large-magnitude
gross errors or outliers. Such outliers can arise from the
failure of measurement devices, measurement communication
errors and the interference of adversarial parties.
Mathematically, when both additive observation noise and outliers
are present, the observation $y$ can be written as



$$y = Hx + e + w, \tag{1.2}$$



where $e$ is a sparse outlier vector with $k \ll n$ nonzero elements,
and $w$ is a measurement noise vector whose elements are
i.i.d. random variables. We further assume that $m$ is
fixed, which is often the case in system identification [16].

If only random measurement errors are present, the least-squares
solution generally provides an asymptotically good estimate.
However, the least-squares estimate breaks down in the
presence of outliers. Thus, it is necessary to protect the
estimates from both random noise and outliers. Research along
this direction has attracted a significant amount of attention,
for example, [1, 3, 4, 5, 6, 7]. In particular, for reducing the
effects of outliers, the least absolute deviation estimate ($\ell_1$
minimization) was proposed and studied in [2, 8, 9, 24].
Instead of searching over all the $\binom{n}{k}$ possibilities for the locations
of outliers, [2, 8, 9] proposed to minimize the least absolute
deviation:



$$\min_{x} \|y - Hx\|_1. \tag{1.3}$$



Under the assumption that the error $e + w$ is an i.i.d. random
sequence with a common density which has median zero and
is continuous and positive in the neighborhood of zero, the
difference between the unknown $x$ and its estimate is
asymptotically Gaussian with zero mean [2]. The problem is that the
assumption of a common density on the outliers is seldom
satisfied in reality. Also, requiring $e + w$ to have median zero is restrictive.
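To make the contrast between least squares and the least absolute deviation estimate (1.3) concrete, the following Python sketch works through a tiny instance with $m = 2$ and outliers only. It is not the authors' implementation: the function names are hypothetical, the Toeplitz indexing $H[i][j] = h_{i-j+1}$ is an assumed convention, and the $\ell_1$ problem is solved by brute-force vertex enumeration (valid when $H$ has full column rank, because an optimal solution of the equivalent linear program can be chosen to fit at least $m$ observations exactly), which is practical only for very small $m$ and $n$.

```python
import random
from itertools import combinations

def toeplitz(h, n, m):
    # H[i][j] = h_{i-j+1} for 1-based i, j (assumed indexing convention).
    return [[h[i - j + 1] for j in range(1, m + 1)] for i in range(1, n + 1)]

def l1_cost(H, y, x):
    """Objective of (1.3): ||y - Hx||_1."""
    return sum(abs(yi - sum(a * b for a, b in zip(row, x))) for row, yi in zip(H, y))

def lad_estimate_m2(H, y):
    """Exact least-absolute-deviation fit for m = 2 by vertex enumeration."""
    best_x, best_cost = None, float("inf")
    for (r1, y1), (r2, y2) in combinations(zip(H, y), 2):
        det = r1[0] * r2[1] - r1[1] * r2[0]
        if abs(det) < 1e-12:
            continue  # the two rows do not determine x
        # Cramer's rule for the 2 x 2 system picked out by the two rows.
        cand = [(y1 * r2[1] - r1[1] * y2) / det, (r1[0] * y2 - y1 * r2[0]) / det]
        cost = l1_cost(H, y, cand)
        if cost < best_cost:
            best_x, best_cost = cand, cost
    return best_x

def least_squares_m2(H, y):
    """Solve the 2 x 2 normal equations (H^T H) x = H^T y."""
    a = sum(r[0] * r[0] for r in H)
    b = sum(r[0] * r[1] for r in H)
    d = sum(r[1] * r[1] for r in H)
    p = sum(r[0] * yi for r, yi in zip(H, y))
    q = sum(r[1] * yi for r, yi in zip(H, y))
    det = a * d - b * b
    return [(p * d - b * q) / det, (a * q - b * p) / det]

random.seed(1)
n, m = 40, 2
h = {i: random.gauss(0.0, 1.0) for i in range(-m + 2, n + 1)}
H = toeplitz(h, n, m)
x_true = [1.5, -0.7]
y = [sum(a * b for a, b in zip(row, x_true)) for row in H]
y[5] += 10.0   # two gross errors, no random noise
y[22] -= 8.0

x_l1 = lad_estimate_m2(H, y)
x_ls = least_squares_m2(H, y)
err_l1 = max(abs(a - b) for a, b in zip(x_l1, x_true))
err_ls = max(abs(a - b) for a, b in zip(x_ls, x_true))
print("l1 error:", err_l1, "least-squares error:", err_ls)
```

With only two corrupted rows out of forty, the $\ell_1$ estimate should coincide with the true state up to rounding, while the least-squares estimate is pulled away by the outliers.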

In [8] and related work, each element of $H$ (or of the non-singular
$(n - m) \times n$ matrix $A$ such that $AH = 0$) is assumed
to be an i.i.d. random variable following a certain distribution,
for example, a Gaussian or Bernoulli distribution.
These types of matrices have been shown to obey certain conditions,
such as restricted isometry conditions [8], so that (1.3)
can correctly recover $x$ when only outliers are present,
and can recover $x$ approximately when both outliers and
measurement noise exist. However, in the system identification
problem, $H$ has a natural Toeplitz structure and the elements
of $H$ are correlated. The natural question is whether (1.3)
also provides a performance guarantee for recovering $x$ with a
Toeplitz matrix. We provide a positive answer in this paper.

Though the elements of Toeplitz matrices are correlated,
we show that Toeplitz structured matrices also enable
the successful recovery of $x$ by using (1.3). The main
contribution of this paper is the establishment of performance
guarantees for Toeplitz structured matrices in sparse error
correction. In particular, we calculate thresholds on the
sparsity $k$ such that an error vector with no more than $k$
nonzero elements can be recovered using (1.3). When both
outliers and observation noise are present, we show
that the estimation error goes to 0 asymptotically as long as
the probability density function of the observation noise is not
"vanishing" around 0.

There is a well-known duality between compressed sensing
[10, 11] and sparse error detection [8, 9]: the null space
of sensing matrices in compressed sensing corresponds to the
tall matrix $H$ in sparse error correction. Toeplitz and
circulant matrices have been studied in compressed sensing in
several papers [17, 18, 19]. In these papers, it has been shown
that Toeplitz matrices are good for recovering sparse vectors
from undersampled measurements. In contrast, in our model
of sparse error correction, the signal itself is not sparse and the
linear system involved is overdetermined rather than
underdetermined. Also, the null space of a Toeplitz matrix does not
necessarily correspond to another Toeplitz matrix, so the problem
studied in this paper is essentially different from those studied
in [17, 18, 19].

The rest of this paper is organized as follows. In Section 2,
we derive performance bounds on the number of outliers we
can correct when only outliers are present. In Section 3, we
derive the estimation of system parameters when both gross
errors and observation noises are present. In Section 4, we
provide the numerical results and conclude our paper by
discussing extensions and future directions.

2. WITH ONLY OUTLIERS 

We establish one main result regarding the threshold of
successful recovery of $\ell_1$ minimization using Toeplitz matrices.

Theorem 2.1 Let $H$ be an $n \times m$ Toeplitz matrix as in (1.1),
where $m$ is a fixed positive integer and $h_i$, $-m+2 \le i \le n$,
are i.i.d. $N(0, 1)$ Gaussian random variables. Suppose that
$y = Hx + e$, where $e$ is a sparse vector with no more than
$k$ nonzero elements. Then there exist a constant $c_1 > 0$
and a constant $\beta > 0$ such that, with probability $1 - e^{-c_1 n}$
as $n \to \infty$, the $n \times m$ Toeplitz matrix $H$ has the following
property: for every $x \in \mathbb{R}^m$ and every error $e$ with
its support $K$ satisfying $|K| = k \le \beta n$, $x$ is the unique
solution to (1.3). Here the constant $0 < \beta < 1$ can be
taken as any number such that for some constant $\mu > 0$ and
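$0 < \delta < 1$,

$$\beta \log(1/\beta) + (1 - \beta)\log\left(\frac{1}{1-\beta}\right) + m\beta\left[\log(2) + \frac{m\mu^2}{2} + \log\left(\Phi(\mu\sqrt{m})\right)\right] + \left(\frac{1}{m} - \beta\right)\left[\log(2) + \frac{1}{2}\mu^2(1-\delta)^2 + \log\left(1 - \Phi(\mu(1-\delta))\right)\right] < 0,$$

where $\Phi(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} e^{-x^2/2}\,dx$
is the cumulative distribution function for the standard
Gaussian random variable.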

Remark: The derived correctable fraction of errors $\beta$
depends on the system dimension $m$. In the rest of this section,
we outline the strategy to prove Theorem 2.1. Our derivation
is based on checking the following now well-known theorem
for $\ell_1$ minimization (see [25], for example).

Theorem 2.2 (1.3) can recover the correct state $x$ whenever
$\|e\|_0 \le k$, if and only if for every nonzero vector $z \in \mathbb{R}^m$,
$\|(Hz)_K\|_1 < \|(Hz)_{\bar{K}}\|_1$ for every subset $K \subseteq \{1, 2, \ldots, n\}$
with cardinality $|K| = k$, where $\bar{K} = \{1, 2, \ldots, n\} \setminus K$.
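The condition in Theorem 2.2 can be probed numerically for a small instance. The sketch below assumes $m = 2$, so that the unit sphere is a circle that can be scanned on a grid of angles, and assumes the Toeplitz indexing $H[i][j] = h_{i-j+1}$; the function name `worst_ratio_m2` is hypothetical. For a fixed $z$ the worst subset $K$ of size $k$ simply collects the $k$ largest values $|(Hz)_i|$, so the condition is equivalent to the ratio $\|(Hz)_K\|_1 / \|Hz\|_1$ staying below $1/2$. A grid scan gives numerical evidence only, not a proof, since the theorem requires the inequality for every $z$.

```python
import math
import random

def toeplitz(h, n, m):
    # H[i][j] = h_{i-j+1} for 1-based i, j (assumed indexing convention).
    return [[h[i - j + 1] for j in range(1, m + 1)] for i in range(1, n + 1)]

def worst_ratio_m2(H, k, num_angles=720):
    """Largest observed value of ||(Hz)_K||_1 / ||Hz||_1 over the angle grid.

    For a fixed z the worst subset K of size k holds the k largest |(Hz)_i|.
    Directions z and -z give the same ratio, so angles in [0, pi) suffice.
    """
    worst = 0.0
    for a in range(num_angles):
        theta = math.pi * a / num_angles
        z = (math.cos(theta), math.sin(theta))
        mags = sorted((abs(r[0] * z[0] + r[1] * z[1]) for r in H), reverse=True)
        worst = max(worst, sum(mags[:k]) / sum(mags))
    return worst

random.seed(2)
n, m = 40, 2
h = {i: random.gauss(0.0, 1.0) for i in range(-m + 2, n + 1)}
H = toeplitz(h, n, m)

for k in (1, 2, 5, 10):
    r = worst_ratio_m2(H, k)
    # ||(Hz)_K||_1 < ||(Hz)_Kbar||_1 is the same as ratio < 1/2.
    status = "holds on grid" if r < 0.5 else "fails on grid"
    print(k, round(r, 3), status)
```

The ratio grows with $k$, and the largest $k$ for which it stays below $1/2$ is an empirical counterpart of the threshold $\beta n$ for this particular matrix.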

The difficulty of checking this condition is that the elements
of $H$ are not independent random variables and that the
condition must hold for every vector in the subspace generated
by $H$. We adopt the following strategy of discretizing the
subspace generated by $H$; see [10, 16, 20]. By homogeneity,
we only need to consider $Hz$ for $z \in \mathbb{R}^m$ with $\|z\|_2 = 1$.
We then pick a finite set $V = \{v_1, \ldots, v_N\}$, called a $\gamma$-net on
$\{z \mid \|z\|_2 = 1\}$ for a constant $\gamma > 0$: in a $\gamma$-net, for every
point $z$ from $\{z \mid \|z\|_2 = 1\}$, there is a $v_l \in V$ such that
$\|z - v_l\|_2 \le \gamma$. We subsequently establish the property in
Theorem 2.2 for all the points in the $\gamma$-net $V$ before extending
the results to every point $Hz$, where $\|z\|_2 = 1$.
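As a small illustration of the discretization step, the following Python sketch builds a $\gamma$-net for the special case $m = 2$, where the unit sphere is a circle and evenly spaced points suffice. The construction and the names `circle_net` and `distance_to_net` are assumptions made for illustration; the argument in the text only needs the existence of a finite net, for which a standard covering bound on the cardinality is $(1 + 2/\gamma)^m$.

```python
import math
import random

def circle_net(gamma):
    """Evenly spaced points on the unit circle forming a gamma-net (m = 2).

    With angular spacing d, every unit vector is within angle d / 2 of a net
    point, i.e. within chord distance 2 * sin(d / 4); we need that <= gamma.
    Requires 0 < gamma <= 2.
    """
    max_spacing = 4.0 * math.asin(gamma / 2.0)
    size = math.ceil(2.0 * math.pi / max_spacing)
    return [(math.cos(2.0 * math.pi * j / size), math.sin(2.0 * math.pi * j / size))
            for j in range(size)]

def distance_to_net(z, net):
    """Euclidean distance from z to its nearest net point."""
    return min(math.hypot(z[0] - v[0], z[1] - v[1]) for v in net)

gamma = 0.1
V = circle_net(gamma)

# Empirically confirm the covering property on random unit vectors.
random.seed(3)
worst = 0.0
for _ in range(2000):
    theta = random.uniform(0.0, 2.0 * math.pi)
    worst = max(worst, distance_to_net((math.cos(theta), math.sin(theta)), V))

print(len(V), worst)
```

In this two-dimensional case the net is far smaller than the general bound, which matters for the union bound only through constants because $m$ is fixed.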

Following this strategy, we establish Lemmas 2.3, 2.4 and
2.5. Lemma 2.5 then directly implies Theorem 2.1. Most
proofs are given in [23] for the sake of space. We first show
the concentration of measure phenomenon for $Hz$, where
$z \in \mathbb{R}^m$ is a single vector with $\|z\|_2 = 1$.

Lemma 2.3 Let $\|z\|_2 = 1$. For any $\epsilon > 0$, there exists a
constant $c_2 > 0$ such that when $n$ is large enough, with
probability $1 - 2e^{-c_2 n^2/(n+m-1)}$, it holds that $(1 - \epsilon)S \le \|Hz\|_1 \le
(1 + \epsilon)S$, where $S = nE\{|X|\}$ and $X$ is a random variable
following the Gaussian distribution $N(0, 1)$.
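The concentration stated in Lemma 2.3 can be observed in a quick simulation. The Python sketch below is an illustration under assumptions: it uses $m = 3$, a fixed unit vector $z$, the Toeplitz indexing $(Hz)_i = \sum_j h_{i-j+1} z_j$, and the hypothetical helper `hz_l1_norm`. It compares $\|Hz\|_1$ with $S = n\sqrt{2/\pi}$, using $E\{|X|\} = \sqrt{2/\pi}$ for a standard Gaussian $X$.

```python
import math
import random

def hz_l1_norm(n, z, rng):
    """Draw a Gaussian input sequence and return ||Hz||_1 for the n x m Toeplitz H."""
    m = len(z)
    # Samples h_{-m+2}, ..., h_n stored in a list; h_i sits at position i + m - 2.
    h = [rng.gauss(0.0, 1.0) for _ in range(n + m - 1)]
    total = 0.0
    for i in range(1, n + 1):
        # (Hz)_i = sum_j h_{i-j+1} z_j with the assumed Toeplitz indexing.
        total += abs(sum(h[(i - j + 1) + m - 2] * z[j - 1] for j in range(1, m + 1)))
    return total

rng = random.Random(4)
z = [0.6, 0.0, 0.8]                        # unit vector, m = 3
expected_abs = math.sqrt(2.0 / math.pi)    # E|X| for X ~ N(0, 1)

ratios = {}
for n in (100, 1000, 10000):
    S = n * expected_abs
    ratios[n] = hz_l1_norm(n, z, rng) / S
    print(n, round(ratios[n], 4))
```

The ratio $\|Hz\|_1 / S$ should approach 1 as $n$ grows, even though neighboring entries of $Hz$ share input samples and are therefore dependent.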

Lemma 2.4 Let $\|z\|_2 = 1$ and let $0 < \delta < 1$ be a constant.
Then there exist a threshold $\beta \in (0, 1)$ and a constant
$c_3 > 0$ (depending on $m$ and $\beta$) such that, with probability
$1 - e^{-c_3 n}$, for all subsets $K \subseteq \{1, 2, \ldots, n\}$ with cardinality
$|K| \le \beta n$,

$$\|(Hz)_K\|_1 \le \frac{1-\delta}{2-\delta}\|Hz\|_1.$$

By a union bound on the size of the $\gamma$-net, Lemmas 2.3 and 2.4
indicate that with overwhelming probability the recovery
condition in Theorem 2.2 holds for the discrete points on the $\gamma$-net.
The following lemma extends the result to $\{z \mid \|z\|_2 = 1\}$.

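Lemma 2.5 There exists a constant $c_4 > 0$ such that when
$n$ is large enough, with probability $1 - e^{-c_4 n}$, the Toeplitz
matrix $H$ has the following property: for every $z \in \mathbb{R}^m$ and
every subset $K \subseteq \{1, 2, \ldots, n\}$ with $|K| \le \beta n$,

$$\sum_{i \in \bar{K}} |(Hz)_i| - \sum_{i \in K} |(Hz)_i| \ge \delta' S \|z\|_2,$$

where $\delta' > 0$ is a constant.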

Proof For any given $\gamma > 0$, there exists a $\gamma$-net $V =
\{v_1, \ldots, v_N\}$ of cardinality less than $(1 + 2/\gamma)^m$ [20]. Since
each row of $H$ has $m$ i.i.d. $N(0, 1)$ entries, the elements of $Hv_j$,
$1 \le j \le N$, are (not independent) $N(0, 1)$ entries. Applying
a union bound on the size of the $\gamma$-net, Lemmas 2.4 and 2.3 imply
that for some $\delta > 0$ and for any constant
$\epsilon > 0$, with probability $1 - 2e^{-cn}$ for some $c > 0$,

$$\|(Hv_j)_K\|_1 \le \frac{(1-\delta)(1+\epsilon)}{2-\delta} S,$$

$$(1 - \epsilon)S \le \|Hv_j\|_1 \le (1 + \epsilon)S$$

hold simultaneously for every vector $v_j$ in $V$.

For any $z$ such that $\|z\|_2 = 1$, there exists a point $v_0$ in $V$ (we
change the subscript numbering for $V$ to index the order)
such that $\|z - v_0\|_2 = \gamma_1 \le \gamma$. Let $z_1$ denote $z - v_0$; then
$\|z_1 - \gamma_1 v_1\|_2 = \gamma_2 \le \gamma_1 \gamma \le \gamma^2$ for some $v_1$ in $V$. Repeating
this process, we have $z = \sum_{j \ge 0} \gamma_j v_j$, where $\gamma_0 = 1$, $\gamma_j \le \gamma^j$
and $v_j \in V$.



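Thus for any $z \in \mathbb{R}^m$, $z = \|z\|_2 \sum_{j \ge 0} \gamma_j v_j$. For any
index set $K$ with $|K| \le \beta n$,

$$\begin{aligned}
\sum_{i \in K} |(Hz)_i| &= \|z\|_2 \sum_{i \in K} \Big|\Big(\sum_{j \ge 0} \gamma_j H v_j\Big)_i\Big| \\
&\le \|z\|_2 \sum_{i \in K} \sum_{j \ge 0} \gamma^j |(Hv_j)_i| \\
&= \|z\|_2 \sum_{j \ge 0} \gamma^j \sum_{i \in K} |(Hv_j)_i| \\
&\le S\|z\|_2 \frac{(1-\delta)(1+\epsilon)}{(2-\delta)(1-\gamma)},
\end{aligned}$$

$$\begin{aligned}
\sum_{i} |(Hz)_i| &= \|z\|_2 \sum_{i} \Big|\Big(\sum_{j \ge 0} \gamma_j H v_j\Big)_i\Big| \\
&\ge \|z\|_2 \sum_{i} \Big(|(Hv_0)_i| - \sum_{j \ge 1} \gamma^j |(Hv_j)_i|\Big) \\
&\ge \|z\|_2 \Big(\sum_{i} |(Hv_0)_i| - \sum_{j \ge 1} \gamma^j \sum_{i} |(Hv_j)_i|\Big) \\
&\ge \|z\|_2 \Big((1-\epsilon)S - \sum_{j \ge 1} \gamma^j (1+\epsilon)S\Big) \\
&\ge S\|z\|_2 \Big(1 - \epsilon - \frac{\gamma(1+\epsilon)}{1-\gamma}\Big).
\end{aligned}$$

So

$$\sum_{i \in \bar{K}} |(Hz)_i| - \sum_{i \in K} |(Hz)_i| \ge S\|z\|_2 \Big(1 - \epsilon - \frac{\gamma(1+\epsilon)}{1-\gamma} - \frac{2(1-\delta)(1+\epsilon)}{(2-\delta)(1-\gamma)}\Big).$$

For a given $\delta$, we can pick $\gamma$ and $\epsilon$ small enough
such that $\sum_{i \in \bar{K}} |(Hz)_i| - \sum_{i \in K} |(Hz)_i| \ge \delta' S\|z\|_2$, satisfying
the condition in Theorem 2.2. ■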



If we do not require $\ell_1$ minimization to correct $k$ outliers
simultaneously for every possible support, the fraction of
correctable outliers can go to 1.

Theorem 2.6 Take an arbitrary constant $0 < \beta < 1$ and let
$y = Hx + e$, where $H$ is a Toeplitz matrix with Gaussian
elements as defined earlier and $e$ is a vector with $k = \beta n$
nonzero elements. As $n \to \infty$, $x$ can be recovered
perfectly using $\ell_1$ minimization in the presence of $k \le \beta n$ sparse errors
with high probability.

3. WITH BOTH OUTLIERS AND OBSERVATION 
NOISES 

We further consider Toeplitz matrix based system identification
when both outliers and random observation errors are
present, namely, the observation $y = Hx + e + w$, where
$e$ is a sparse error with no more than $k$ nonzero elements and
$w$ is the vector of additive observation noises. We can show
that, under mild conditions, the error $\|\hat{x} - x\|_2$ goes to 0 even
when there are both outliers and random observation errors, where
$\hat{x}$ is the solution to (1.3).

Theorem 3.1 Let $m$ be a fixed positive integer and $H$ be an
$n \times m$ Toeplitz matrix ($m < n$) as in (1.1), with each element $h_i$,
$-m+2 \le i \le n$, being an i.i.d. $N(0, 1)$ Gaussian random variable.
Suppose $y = Hx + e + w$, where $e$ is a sparse vector
with $k \le \beta n$ nonzero elements ($\beta < 1$ is a constant) and $w$
is the observation noise vector. For any constant $t > 0$, we
assume that, with high probability as $n \to \infty$, at least $\alpha(t)n$
elements of $w + e$ (where $\alpha(t) > 0$ is a constant depending on $t$)
are no bigger than $t$ in amplitude. Then $\|\hat{x} - x\|_2 \to 0$
with high probability as $n \to \infty$, where $\hat{x}$ is the solution to
(1.3).



Proof $\|y - H\hat{x}\|_1$ can be written as $\|H(x - \hat{x}) + e + w\|_1$.
We argue that for any constant $t > 0$, with high probability as
$n \to \infty$, for all $\hat{x}$ such that $\|x - \hat{x}\|_2 = t$, $\|H(x - \hat{x}) + e + w\|_1 >
\|e + w\|_1$, contradicting $\hat{x}$ being the solution to (1.3).

To see this, we cover the sphere $Z = \{z \mid \|z\|_2 = 1\}$ with
a $\gamma$-net $V$. We first argue that for every discrete point $tv_j$ with
$v_j$ from the $\gamma$-net, $\|Htv_j + e + w\|_1 > \|e + w\|_1$; and then
extend the result to the set $tZ$.

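Let us denote $g(h, t) = \|Htv_j + e + w\|_1 - \|e + w\|_1 =
\sum_{i=1}^{n} (|l_i + t(Hv_j)_i| - |l_i|)$, where $l_i = (e + w)_i$ for $1 \le
i \le n$. We note that $(Hv_j)_i$ is a Gaussian random variable
$N(0, 1)$. Let $X$ be a Gaussian random variable $N(0, \sigma^2)$; then
for an arbitrary number $l$,

$$E\{|l + tX| - |l|\} = \frac{2}{\sqrt{2\pi}\,t\sigma} \int_0^\infty x e^{-\frac{(|l| + x)^2}{2t^2\sigma^2}}\,dx = \sqrt{\frac{2}{\pi}}\,t\sigma e^{-\frac{l^2}{2t^2\sigma^2}} - 2|l|\left(1 - \Phi\left(\frac{|l|}{t\sigma}\right)\right),$$

which is a decreasing nonnegative function in $|l|$. From this,

$$E\{g(h, t)\} = \sum_{i=1}^{n} \left(\sqrt{\frac{2}{\pi}}\,t e^{-\frac{l_i^2}{2t^2}} - 2|l_i|\left(1 - \Phi\left(\frac{|l_i|}{t}\right)\right)\right).$$

When $|l| \le t$ and $\sigma = 1$, $E\{|l + tX| - |l|\} \ge \sqrt{2/\pi}\,t e^{-1/2} - 2t(1 -
\Phi(1)) \ge 0.1666t$. It is also not hard to verify that $|g(a, t) -
g(b, t)| \le \sum_{i} t\sqrt{m}\,|a_i - b_i| \le t\sqrt{mn}\,\|a - b\|_2$, so $g(h, t)$
has a Lipschitz constant (for $h$) no bigger than $t\sqrt{mn}$.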
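The closed-form expectation used in this step can be checked numerically. The following Python sketch evaluates $\sqrt{2/\pi}\,t\sigma e^{-l^2/(2t^2\sigma^2)} - 2|l|(1 - \Phi(|l|/(t\sigma)))$ and compares it with a Monte Carlo estimate of $E\{|l + tX| - |l|\}$; the function names and the sample sizes are choices made for this illustration only.

```python
import math
import random

def upper_tail(u):
    """1 - Phi(u) for the standard Gaussian, via the complementary error function."""
    return 0.5 * math.erfc(u / math.sqrt(2.0))

def expected_gain(l, t, sigma=1.0):
    """Closed form of E{|l + tX| - |l|} for X ~ N(0, sigma^2) and t > 0."""
    s = t * sigma
    a = abs(l)
    return (math.sqrt(2.0 / math.pi) * s * math.exp(-a * a / (2.0 * s * s))
            - 2.0 * a * upper_tail(a / s))

def monte_carlo_gain(l, t, sigma=1.0, samples=100000, seed=5):
    """Sample average of |l + tX| - |l| with X ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    return sum(abs(l + t * rng.gauss(0.0, sigma)) - abs(l)
               for _ in range(samples)) / samples

t = 0.5
for l in (0.0, 0.25, 0.5, 1.0, 2.0):
    print(l, round(expected_gain(l, t), 5), round(monte_carlo_gain(l, t), 5))

# At |l| = t with sigma = 1 the closed form equals about 0.16663 * t.
print(expected_gain(t, t) / t)
```

The values decrease in $|l|$ and stay nonnegative, and the value at $|l| = t$ gives the constant $0.1666$ used in the lower bound.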

Then, by the concentration of measure phenomenon for Gaussian
random variables (see [12, 20]),

$$P(g(h, t) \le 0) \le P\big(|g(h, t) - E\{g(h, t)\}| \ge E\{g(h, t)\}\big) \le 2e^{-B}, \qquad B = \frac{(E\{g(h, t)\})^2}{2mnt^2}.$$

Fig. 1: Recoverable fraction of errors versus m



If there exists a constant $\alpha(t)$ such that, as $n \to \infty$, at least
$\alpha(t)n$ elements $l_i$ have magnitudes smaller than $t$, then the
numerator in $B$ behaves as $\Theta(n^2)$ and the corresponding
probability $P(g(h, t) \le 0)$ behaves as $2e^{-\Theta(n)}$. This is because
when $|l| \le t$, $\sqrt{2/\pi}\,t e^{-\frac{l^2}{2t^2}} - 2|l|(1 - \Phi(|l|/t)) \ge 0.1666t$.

By the same reasoning, $g(h, t) \le \epsilon n$ holds with probability
no more than $e^{-c_5 n}$ for each discrete point from the $\gamma$-net
$tV$, where $\epsilon > 0$ is a sufficiently small constant and $c_5 > 0$
is a constant which may depend on $\epsilon$. Since there are at most
$(1 + 2/\gamma)^m$ points in the $\gamma$-net, by a simple union bound,
with probability $1 - e^{-c_6 n}$ as $n \to \infty$, $g(h, t) > \epsilon n$ holds
for all points from the $\gamma$-net $tV$, where $c_6 > 0$ is a constant
and $\gamma$ can be taken as an arbitrarily small constant. Following
$\gamma$-net proof techniques similar to those for Lemmas 2.3, 2.4 and 2.5,
if we choose a sufficiently small constant $\epsilon > 0$ and accordingly
a sufficiently small constant $\gamma > 0$, $g(h, t) > 0.5\epsilon n$
holds simultaneously for every point in the set $tZ$ with high
probability $1 - e^{-c_7 n}$, where $c_7 > 0$ is a constant.

Notice that if $g(h, t) > 0$ for $t = t_1$, then necessarily $g(h, t) > 0$
for $t = t_2 > t_1$. This is because $g(h, t)$ is a convex function
in $t \ge 0$ and $g(h, 0) = 0$. So if $g(h, t) > 0.5\epsilon n > 0$ holds
with high probability for every point in $tZ$, necessarily
$\|\hat{x} - x\|_2 < t$, because $\hat{x}$ minimizes the objective in (1.3). Because
we can pick $t$ to be arbitrarily small, $\|\hat{x} - x\|_2 \to 0$ with high
probability as $n \to \infty$. ■



We remark that the mild conditions in Theorem 3.1 are
satisfied easily if $\beta < 1$ and the elements in $w$ are i.i.d.
random variables following a probability density function $f(s)$
that is not "vanishing" around $s = 0$ (namely, the cumulative
distribution function $F(t) > 0$ for any $t > 0$; $f(0)$ can
sometimes be 0). For example, Gaussian distributions, exponential
distributions, and Gamma distributions for $w$ all satisfy
such conditions in Theorem 3.1. This greatly broadens the
results in [2], which requires $f(0) > 0$ and does not accommodate
outliers. Compared with the analysis in compressed sensing
[8, 12], this result is for Toeplitz matrices in error correction and
applies to observation noises with non-Gaussian distributions.

Fig. 2: With outliers and noises of different distributions



4. NUMERICAL EVALUATIONS 



Based on Theorem 2.1, we calculate the strong thresholds in
Figure 1 for different values of $m$ by optimizing over $\mu > 0$
and $\delta$. As $m$ increases, the correlation length in the matrix
$H$ also increases and the corresponding correctable fraction
of errors decreases (but remains positive). We then evaluate in
Figure 2 the $\ell_2$-norm error $\|\hat{x} - x\|_2$ of $\ell_1$ minimization for
Gaussian Toeplitz matrices under both outliers and i.i.d.
observation noises of different probability distributions: the Gamma
distribution with shape parameter $k = 2$ and scale $1/\sqrt{6}$, the
standard Gaussian distribution $N(0, 1)$, and the exponential
distribution with mean $1/\sqrt{2}$. These distributions are chosen such that
the observation noises have the same expected energy. The
system parameter $m$ is set to 5 and the entries of the system state $x$ are
generated as i.i.d. standard Gaussian random variables. We
randomly pick the support of the error vector $e$ and fill it with
i.i.d. $N(0, 100)$ Gaussian outliers. For all these distributions,
the average error goes to 0 (we also verified points beyond
$n > 1000$). What is interesting is that the error goes to 0
at different rates. As hinted by the proof of Theorem 3.1,
the Gamma distribution has the worst performance
because its probability density function is smaller around the
origin (actually 0 at the origin), while the exponential
distribution has the largest probability density function around 0.